Abstract

The reliabilities of six item bias indices are investigated for each of the eleven tests of the Iowa Tests of Basic Skills, using random samples of fifth grade students. Both racial and sexual bias are considered. The reliability of an index is defined here as its stability from one randomly equivalent group to another. The results indicate that the item bias indices investigated are fairly unreliable when based on sample sizes of 200 minority and 200 majority examinees. Consequently, this study suggests that the use of item bias indices to screen achievement test items cannot be expected to lead to consistent decisions about which items are biased with sample sizes of about 200. Additionally, correlations among bias indices are investigated.

The Reliability of Selected Item Bias Procedures

The elimination of biased (cultural, racial, sexual, etc.) items from achievement tests is often conceptualized as a two-stage process. First, "experts" judge the fairness of the presentation format and content of the items for a variety of groups. Those items which are judged to be unfair, or biased, are excluded from the test. Second, many researchers, including Scheuneman (1979), have advocated the use of item bias statistics to screen test items prior to the construction of final test forms. Ideally, bias indices would be calculated from item tryout data. Based on these indices, biased items would be excluded from the test in much the same way that test items with low item discriminations are excluded in the item tryout stages of test development.

Item bias indices should produce stable results if they are to be used beneficially for screening purposes. However, certain studies suggest that item bias statistics may be fairly unstable. Studies by Plake (1980) and Qualls and Hoover (1981) suggested that the statistical bias indices are only minimally related to "experts'" judgments of item bias. Scheuneman (1980) and Linn, Levine, Hastings, and Wardrop (1981) found only modest agreement among item bias statistics across independent samples. Linn et al. (1981) concluded that ". . . it may be difficult to identify biased items because of the unreliability of the indices used" (p. 170).

None of the previously completed studies directly addressed the issue of the reliability (stability from one randomly equivalent group to another) of item bias indices. For this reason, the reliabilities of each of six internal criterion item bias indices were investigated in the present study. Indices were calculated for both race and sex categorizations for each of the eleven tests of the Iowa Tests of Basic Skills administered to fifth grade students. Only unsigned versions of the indices (Ironson & Subkoviak, 1979) were investigated since item screening, as usually conceived, involves eliminating items biased against any group. The indices were based on samples of 200 examinees from each race or sex categorization. These sample sizes were viewed as being the largest which typically would be available for minority students in most item tryout situations. Additionally, the relationships among item bias indices were examined.

No discussion of the differences among definitions of item bias or among item bias statistics will be presented here. These issues are discussed in a variety of sources including Hunter (1975), Ironson and Subkoviak (1979), Lord (1980), Marascuilo and Slaughter (1981), Rudner, Getson, and Knight (1980a,b), and Shepard, Camilli, and Averill (1981).

Item Bias Indices 

Six different item bias indices were evaluated in this study. The difficulty and delta indices to be discussed were designed to detect group differences (e.g., between blacks and whites) in relative item difficulty. The biserial and point biserial indices were designed to detect group differences in item discrimination. The Scheuneman and 3-parameter indices were designed to detect differences in relative item difficulty by score level and latent ability level, respectively.

Difficulty and Delta Indices

The difficulty index was referred to as the transformed item difficulties-45° line method by Rudner et al. (1980a), except that the absolute value of the Rudner et al. (1980a) index was used in the present study. For this index, item difficulties (p-values) are calculated and standardized (mean of zero; standard deviation of one) within each group. The difficulty index for an item is calculated as the absolute value of the difference between the standardized item difficulties for the two race or sex groups.

The delta index was referred to as the transformed item difficulties-major axis index by Rudner et al. (1980a) with one substantive modification: the delta index is the absolute value of the Rudner et al. (1980a) index. For this index, the within group item difficulties are transformed using the inverse normal transformation. These transformed difficulties are then standardized (mean of zero; standard deviation of one) within groups. The delta index for an item is the absolute difference between the standardized transformed difficulties for the two groups. A similar approach was used by Angoff and Ford (1973).
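
The two indices described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the p-values are hypothetical, and the standardization here uses the population standard deviation over items, a detail the text does not specify.

```python
from statistics import NormalDist

import numpy as np


def standardize(values):
    """Rescale values to mean zero and standard deviation one (assumed ddof=0)."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()


def difficulty_index(p_group1, p_group2):
    """Absolute difference between within-group standardized p-values."""
    return np.abs(standardize(p_group1) - standardize(p_group2))


def delta_index(p_group1, p_group2):
    """As difficulty_index, but after an inverse normal transformation."""
    inv = np.vectorize(NormalDist().inv_cdf)  # requires 0 < p < 1 for every item
    return np.abs(standardize(inv(p_group1)) - standardize(inv(p_group2)))


# Hypothetical p-values for five items in two groups (not data from the study).
p_a = np.array([0.30, 0.45, 0.60, 0.75, 0.90])
p_b = np.array([0.20, 0.35, 0.50, 0.65, 0.80])
print(difficulty_index(p_a, p_b))
print(delta_index(p_a, p_b))
```

In this made-up example one group is uniformly 0.10 lower on every item, so the difficulty index is essentially zero for all items: standardizing within groups removes an overall mean difference and leaves only differences in relative item difficulty.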

Biserial and Point Biserial Indices 

The biserial index for an item is the absolute difference between the within group biserial correlations of the item with total score. The point biserial index for an item is the absolute difference between the within group point biserial correlations of the item with total score.
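
A minimal sketch of these two discrimination indices follows. The function names and data are hypothetical, the biserial is obtained with the standard textbook rescaling of the point biserial, and whether the total score should include the item itself is not stated in the text, so no correction is applied here.

```python
from statistics import NormalDist

import numpy as np


def point_biserial(item, total):
    """Pearson correlation between a 0/1 item score and the total score."""
    return float(np.corrcoef(item, total)[0, 1])


def biserial(item, total):
    """Biserial correlation obtained by rescaling the point biserial."""
    p = float(np.mean(item))            # proportion answering correctly
    z = NormalDist().inv_cdf(p)         # normal deviate that cuts off p
    ordinate = NormalDist().pdf(z)      # normal density at that deviate
    return point_biserial(item, total) * np.sqrt(p * (1 - p)) / ordinate


def discrimination_index(item1, total1, item2, total2, corr=point_biserial):
    """Absolute difference between the within-group item-total correlations."""
    return abs(corr(item1, total1) - corr(item2, total2))


# Hypothetical responses and total scores for eight examinees per group.
total = np.array([9, 8, 7, 6, 5, 4, 3, 2])
item_a = np.array([1, 1, 1, 0, 1, 0, 0, 0])
item_b = np.array([1, 0, 1, 0, 1, 0, 1, 0])
print(discrimination_index(item_a, total, item_b, total))
print(discrimination_index(item_a, total, item_b, total, corr=biserial))
```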

Scheuneman Index

The Scheuneman index (Scheuneman, 1979) was calculated for each item using five score levels. The score levels were defined such that approximately equal numbers of examinees were in each level. According to Scheuneman (1979), the index could be expected to be distributed approximately chi-square with four degrees of freedom.
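
The following sketch shows one way such an index could be computed. It assumes the index sums, over the two groups and five score levels, the squared difference between observed and expected numbers of correct responses divided by the expected number, with expectations taken from the pooled proportion correct at each level. The function names and data are hypothetical, and a level at which nobody answers correctly would need special handling that is omitted here.

```python
import numpy as np


def score_levels(total, n_levels=5):
    """Assign score levels holding roughly equal numbers of examinees."""
    cuts = np.quantile(total, np.linspace(0, 1, n_levels + 1)[1:-1])
    return np.digitize(total, cuts)   # equal scores always share a level


def scheuneman_index(correct, level, group, n_levels=5):
    """Sum of (observed - expected)^2 / expected numbers correct."""
    correct, level, group = map(np.asarray, (correct, level, group))
    chi_square = 0.0
    for j in range(n_levels):
        in_level = level == j
        pooled_p = correct[in_level].mean()      # both groups combined
        for g in (0, 1):
            cell = in_level & (group == g)
            observed = correct[cell].sum()
            expected = cell.sum() * pooled_p
            chi_square += (observed - expected) ** 2 / expected
    return float(chi_square)


# Hypothetical data: 5 levels, 4 examinees per group at each level.
level = np.repeat(np.arange(5), 8)
group = np.tile([0, 0, 0, 0, 1, 1, 1, 1], 5)
same = np.tile([1, 1, 0, 0, 1, 1, 0, 0], 5)      # equal success in both groups
split = np.tile([1, 1, 1, 1, 0, 0, 0, 0], 5)     # only group 0 succeeds
print(scheuneman_index(same, level, group))
print(scheuneman_index(split, level, group))
```

With two groups and five levels the index is referred to a chi-square distribution with (2 - 1)(5 - 1) = 4 degrees of freedom, whose 0.05 critical value is 9.488.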

3-Parameter Index

The 3-parameter index is a modification of the index proposed by Linn and Harnisch (1981). This index was chosen because it can be used with smaller sample sizes than the more widely recommended index suggested by Lord (1980). For this index, first the item and ability parameters of the three-parameter logistic item response theory model are estimated for the combined group of examinees. For example, item responses for black and white students are pooled in order to estimate the model parameters. The two groups of examinees are then separated. For each examinee, the difference between the examinee's estimated probability (p) of correctly answering the item and the examinee's actual response to the item (1 = correct; 0 = incorrect) is found. This quantity is then divided by a standard error, the square root of p(1 - p), and averaged over examinees within each group. The mean for each group is then squared and the two squared means summed to arrive at the 3-parameter index.
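
The steps in this paragraph can be sketched as below, assuming the item and ability parameters have already been estimated elsewhere on the combined group. The function names are hypothetical, and the scaling constant 1.7 in the logistic function is a common convention assumed here rather than something stated in the text.

```python
import numpy as np


def three_pl_probability(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + np.exp(-1.7 * a * (theta - b)))


def three_parameter_index(p, response, group):
    """Sum over the two groups of the squared mean standardized residual."""
    p, response, group = (np.asarray(x, dtype=float) for x in (p, response, group))
    residual = (p - response) / np.sqrt(p * (1.0 - p))
    return float(sum(residual[group == g].mean() ** 2 for g in (0, 1)))


# Hypothetical item on which every examinee has p = 0.5.
p = np.full(8, 0.5)
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
balanced = np.array([1, 1, 0, 0, 1, 1, 0, 0])   # both groups perform as predicted
one_sided = np.array([1, 1, 1, 1, 0, 0, 0, 0])  # group 0 over-, group 1 under-performs
print(three_parameter_index(p, balanced, group))
print(three_parameter_index(p, one_sided, group))
```

Because the group means are squared before being summed, the index is unsigned: it grows whenever either group departs from the model's prediction, whatever the direction.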

Method 

The data consisted of item responses by 800 fifth grade students who participated in the 1977 national standardization of the Iowa Tests of Basic Skills (ITBS). The sample included 200 black males, 200 black females, 200 white males, and 200 white females, with equal numbers of each of these groups randomly selected from individual schools in the standardization sample. Thus, the sample contained equal numbers of black and white pupils and was balanced by sex. In addition, the confounding of curriculum differences and ethnic group membership, common to many item bias studies, was partially controlled. All eleven tests from the ITBS were analyzed.

The black students were randomly divided, stratified by sex, into two samples of 200 students each. The same procedure was followed for white students. Item bias statistics were calculated for the first sample of black vs. the first sample of white students as well as for the second sample of black vs. the second sample of white students. The item bias indices were calculated separately for each of the eleven ITBS tests. Identical procedures were followed for the female vs. male comparisons except that the stratification in the random sampling was by race.
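
A stratified random split of this kind might look like the sketch below. The function name, the seed, and the examinee labels are hypothetical; the study's actual randomization procedure is not described beyond what is said above.

```python
import numpy as np


def stratified_halves(stratum, rng):
    """Return a 0/1 sample label per examinee, balanced within each stratum."""
    stratum = np.asarray(stratum)
    sample = np.empty(len(stratum), dtype=int)
    for s in np.unique(stratum):
        members = np.flatnonzero(stratum == s)
        shuffled = rng.permutation(members)
        half = len(shuffled) // 2
        sample[shuffled[:half]] = 0      # first randomly equivalent sample
        sample[shuffled[half:]] = 1      # second randomly equivalent sample
    return sample


rng = np.random.default_rng(1977)        # seed chosen only for repeatability
sex = np.repeat(["F", "M"], 200)         # 400 hypothetical black examinees
sample = stratified_halves(sex, rng)
print(np.bincount(sample))
```

Applying the same function to the 400 white examinees, and then pairing sample 0 with sample 0 and sample 1 with sample 1, gives the two race comparisons; stratifying on race instead gives the two sex comparisons.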

The reliability of each item bias index was investigated by test for the race categorization as well as for the sex categorization. The correlation between the values of an item bias index across random samples was used as a measure of the reliability of the index. Additionally, items were classified as either biased or unbiased using the difficulty, delta, and Scheuneman indices. Items with difficulty or delta indices above 0.75 were classified as biased by that index on the suggestion of Rudner et al. (1980b). Items with Scheuneman index values which surpassed the 0.05 critical value of a chi-square distribution with four degrees of freedom were classified as biased on the recommendation of Scheuneman (1979). The agreement in classification of items across random samples by a given index was used as another method to investigate the reliabilities of each of these three item bias indices.
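
Both reliability checks can be illustrated with a short sketch. The function names and index values below are hypothetical; only the 0.75 cutoff and the four degrees of freedom come from the text, and 9.488 is the standard 0.05 critical value for a chi-square with four degrees of freedom.

```python
import numpy as np

DIFFICULTY_CUTOFF = 0.75     # cutoff for the difficulty and delta indices
CHI2_CRIT_DF4 = 9.488        # 0.05 critical value, chi-square with 4 df


def index_reliability(index_sample1, index_sample2):
    """Correlation over items between the index values of two random samples."""
    return float(np.corrcoef(index_sample1, index_sample2)[0, 1])


def flag_biased(index_values, cutoff):
    """Boolean flags for the items whose index exceeds the cutoff."""
    return np.asarray(index_values) > cutoff


def agreement_counts(flag1, flag2):
    """Items flagged in sample one, in sample two, and in both samples."""
    flag1, flag2 = np.asarray(flag1), np.asarray(flag2)
    return int(flag1.sum()), int(flag2.sum()), int((flag1 & flag2).sum())


# Hypothetical difficulty index values for five items in each random sample.
d1 = np.array([0.10, 0.90, 0.30, 0.80, 0.20])
d2 = np.array([0.20, 0.70, 0.95, 0.85, 0.10])
print(index_reliability(d1, d2))
print(agreement_counts(flag_biased(d1, DIFFICULTY_CUTOFF),
                       flag_biased(d2, DIFFICULTY_CUTOFF)))
```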

The values of each item bias index were pooled over all of the items in the test battery, and the reliability of each index and the intercorrelations among indices, across randomly equivalent samples, were estimated. Additionally, disattenuated intercorrelations were estimated in order to investigate the relationships among item bias statistics in the absence of estimation error.
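
The text does not give the formula used, but the usual correction for attenuation divides an observed correlation by the square root of the product of the two reliabilities. A minimal sketch under that assumption, with hypothetical input values:

```python
import math


def disattenuate(r_xy, r_xx, r_yy):
    """Observed correlation corrected for unreliability in both measures."""
    if r_xx <= 0 or r_yy <= 0:
        raise ValueError("both reliabilities must be positive")
    return r_xy / math.sqrt(r_xx * r_yy)


# Hypothetical values: observed correlation 0.11, reliabilities 0.25 and 0.16.
print(disattenuate(0.11, 0.25, 0.16))
```

When the reliabilities are themselves low and estimated with error, the corrected value can exceed 1.0, and the correction is undefined when a reliability estimate is zero or negative.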

Results 

An attempt was made to estimate the three-parameter logistic item response model parameters using separate LOGIST (Wood, Wingersky, and Lord, 1978) runs for each randomly equivalent sample of 400 examinees. However, LOGIST failed to converge. Because of these convergence problems, the parameter estimation was completed using all 800 examinees. The 3-parameter indices were calculated using these parameter estimates following the same general procedures as were followed for the other indices. The use of parameter estimates from the combined sample results in a dependency between indices across randomly equivalent samples. Therefore, the reported reliabilities for the 3-parameter index are probably overestimates of the actual reliabilities of the index. For this reason, the index was calculated only for the vocabulary and language usage tests of the ITBS.

The means and standard deviations of raw scores on each test are presented in Table 1. The means and standard deviations were generally larger for whites than for blacks. There also appeared to be a tendency for the females in this sample to earn slightly higher scores than the males.

Insert Tables 1 and 2 about here

The reliabilities of item bias indices for the race comparison are presented in Table 2. Very few of the reliabilities surpassed the .05 critical value. The reliabilities were generally in the very low to, at best, moderate range. The reliabilities for the language usage test were the only ones which were consistently moderate across indices. Overall, the difficulty and delta indices tended to produce more reliable results than any of the other indices for the race comparison. However, Hunter (1975) illustrates how mean differences between groups can lead to large values of these bias statistics, even when the item is not biased. Thus, the reliability of these indices may have been more of an artifact of the substantial mean differences between blacks and whites than reliability for detecting item bias, per se. Additionally, the Scheuneman index tended to produce the least reliable results for the race comparison. Also, note that for the vocabulary and language usage tests, the 3-parameter index tended to have a lower reliability than the other indices.

The reliabilities of the item bias indices for the sex comparison are presented in Table 3. The reliabilities were generally very low. In fact, there is little evidence to suggest that the reliabilities for any index, except possibly the Scheuneman index, were above zero.

Note that reliabilities of signed indices are included in the Appendix for the sake of completeness. Tables corresponding to Tables 2 and 3 are provided.



Insert Tables 3 and 4 about here

The intercorrelations among item bias indices across all tests for the race comparison are shown in Table 4. The diagonal entries represent the indices' reliabilities across tests. These reliabilities were fairly low. The values above the diagonal represent the average intercorrelations among indices across samples. For example, the 0.29 value in the table represents the average of two correlations. The first was the correlation between the difficulty index for the first random sample and the delta index for the second random sample. The second correlation included in the average was between the difficulty index for the second random sample and the delta index for the first random sample. The values above the diagonal were used in combination with the reliabilities to arrive at the disattenuated correlations presented below the diagonal in Table 4.
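
The averaging of the two cross-sample correlations described in this paragraph can be written as follows. The function name and the four-item arrays are hypothetical and serve only to show the pairing of samples.

```python
import numpy as np


def average_cross_correlation(x1, x2, y1, y2):
    """Mean of corr(x in sample 1, y in sample 2) and corr(x in sample 2, y in sample 1)."""
    r_12 = np.corrcoef(x1, y2)[0, 1]
    r_21 = np.corrcoef(x2, y1)[0, 1]
    return float((r_12 + r_21) / 2.0)


# Hypothetical index values for four items (x = one index, y = another index).
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, 2.0, 3.0, 4.0])
y1 = np.array([4.0, 3.0, 2.0, 1.0])
y2 = np.array([2.0, 4.0, 6.0, 8.0])
print(average_cross_correlation(x1, x2, y1, y2))
```

Correlating the indices across rather than within samples keeps estimation error that is shared by two indices computed on the same examinees from inflating the intercorrelation.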

The disattenuated correlations strongly suggest that the difficulty and delta indices both reflect the same item bias property and that the biserial and point biserial indices both reflect the same item bias property. The disattenuated correlations also strongly suggest that the difficulty and delta indices reflect a very different item bias property than that reflected by the biserial and point biserial indices. Additionally, the disattenuated correlations suggest that the Scheuneman index reflects properties reflected by both the difficulty/delta indices and biserial/point biserial indices of item bias.

Table 5 presents the intercorrelations among bias indices for the sex comparison. The reliabilities as well as the intercorrelations among indices were negligible. Disattenuated correlations are not presented as all of the reliabilities in the table failed to surpass the .05 critical value. Overall, the results suggested little or no consistency for the sex comparison across random samples for any index.

Insert Table 5 about here



The numbers of items classified as biased by the difficulty, delta, and Scheuneman indices are presented in Table 6 for the race comparison and in Table 7 for the sex comparison. The results presented suggest that there was minimal agreement across randomly equivalent samples, at best.

Discussion

The results suggested that the item bias indices investigated are fairly unreliable when based on sample sizes of 200 minority and 200 majority examinees. The use of item bias indices to screen achievement test items for bias could not be expected to lead to consistent decisions about which items are biased with these sample sizes.

One potential explanation of the instability of the indices is that few, if any, biased items are included on the ITBS. In the ITBS test construction procedures, the content and presentation format of the test items are evaluated for bias using "experts'" judgments. Perhaps the use of the judgments of "experts" is sufficient to detect biased items in achievement tests and the item bias statistics provide little additional information. If so, then it would be more beneficial for test constructors to use available resources to hire "experts" to screen items rather than to compute item bias indices.

If the screening of items for bias using item bias indices is to produce beneficial results, then research is needed to ascertain the sample sizes necessary to produce sufficiently stable results. The present study clearly showed that sample sizes of 200 minority and 200 majority examinees are too small to allow for reliable decisions of bias based on the bias indices that were investigated.

Table 1

Mean and Standard Deviation of Raw Scores by Race and Sex

                        Number          Race                 Sex
Test                  of Items    Blacks    Whites   Females     Males   Overall

Vocabulary                  39     13.61     21.66     17.87     17.40     17.63
                                  (6.93)    (9.22)    (8.65)    (9.51)    (9.09)
Reading                     54     17.46     26.30     22.23     21.53     21.88
                                  (7.13)   (11.02)    (9.81)   (10.72)   (10.27)
Spelling                    40     17.46     22.??     22.02     17.90     19.96
                                  (8.86)    (9.28)    (9.?6)    (9.05)    (9.33)
Capitalization              30     12.31     15.83     15.00     13.13     14.07
                                  (4.69)    (5.72)    (5.29)    (5.58)    (5.51)
Punctuation                 30     10.60     14.70     13.52     11.78     12.65
                                  (4.60)    (6.16)    (5.85)    (5.63)    (5.80)
Language Usage              30      9.60     15.50     13.15     11.95     12.55
                                  (4.74)    (6.75)    (6.44)    (6.58)    (6.53)
Visual Materials            46     15.59     21.84     18.65     18.77     18.71
                                  (5.29)    (7.51)    (6.64)    (7.74)    (7.21)
Reference Materials         45     17.57     23.82     21.90     19.4?     20.69
                                  (7.21)    (9.71)    (8.91)    (9.15)    (9.10)
Math Concepts               37     12.97     17.71     15.65     15.03     15.34
                                  (5.30)    (6.64)    (6.19)    (6.70)    (6.46)
Math Problem Solving        27      9.54     13.10     11.27     11.36     11.32
                                  (4.15)    (5.41)    (4.80)    (5.46)    (5.14)
Math Computation            45     19.85     22.32     22.24     19.92     21.08
                                  (7.43)    (8.18)    (7.64)    (8.01)    (7.91)

Note: Values in parentheses represent standard deviations.

Table 2

Reliability of Item Bias Indices for Race

                        Number                                             Point
Test                  of Items  Difficulty       Delta    Biserial    Biserial  Scheuneman 3-Parameter(a)

Vocabulary                  39        .38*        .32*        .22         .43*        .06         .25
Reading                     54        .25           ?         .04        ?.18        -.16
Spelling                    40        .24         .21         .04         .08         .24
Capitalization              30       -.09        -.07         .44*        .47*        .31
Punctuation                 30        .45*        .35*        .17         .24         .26
Language Usage              30        .48*        .55*        .49*        .64*        .55*        .36*
Visual Materials            46        .41*        .24         .07         .18         .04
Reference Materials         45        .01        -.07         .03         .07         .06
Math Concepts               37        .19         .14         .21         .14        -.30
Math Problem Solving        27        .13         .08         .29         .37*        .04
Math Computation            45       -.06        -.01         .04         .09           ?

Median                                .24         .19         .17         .18         .06

* p < .05

(a) Index was computed only for those tests with values in this column.

Table 3

Reliability of Item Bias Indices for Sex

                        Number                                             Point
Test                  of Items  Difficulty       Delta    Biserial    Biserial  Scheuneman 3-Parameter(a)

Vocabulary                  39        .22         .22         .14         .11         .01         .09
Reading                     54        .19         .15         .08         .12         .34*
Spelling                    40       -.19        -.15        -.23        -.23         .00
Capitalization              30       -.21        -.13        -.14        -.19           ?
Punctuation                 30        .23         .19        -.15        -.11         .11
Language Usage              30       -.11        -.14        -.05        -.03         .31        -.16
Visual Materials            46        .14         .10         .21         .18         .03
Reference Materials         45       -.15        -.16        -.09        -.09         .08
Math Concepts               37       -.04        -.10         .22         .22         .37*
Math Problem Solving        27       -.17           ?         .05         .0?        -.02
Math Computation            45        .10         .04         .00        -.11         .38*

Median                               -.04        -.10         .00        -.03         .11

* p < .05

(a) Index was computed only for those tests with values in this column.

Table 4

Correlations Between Item Bias Indices Across All Tests for Race

                                                       Point
Index             Difficulty     Delta   Biserial   Biserial   Scheuneman

Difficulty               ?        .29*      .00        .01         .06
Delta                  .99+         ?       .02        .03         .07
Biserial               .01        .08         ?          ?         .11*
Point Biserial         .01        .11       .99          ?         .11*
Scheuneman             .45        .36       .59        .51         .15*

* p < .05

Note: Diagonal values are reliabilities across all tests. Values above the diagonal are average correlations between indices across all tests. Values below the diagonal are disattenuated correlations between indices across all tests. Correlations were based on 423 items.

Table 5

Correlations Between Item Bias Indices Across All Tests for Sex

                                                       Point
Index             Difficulty     Delta   Biserial   Biserial   Scheuneman

Difficulty               ?       0.07      0.00       0.01        0.01
Delta                            0.07      0.01       0.03        0.03
Biserial                                      ?       0.02        0.08
Point Biserial                                        0.02        0.07
Scheuneman                                                           ?

Note: None of the correlations surpassed the .05 critical value. Diagonal values are reliabilities across tests. Values above the diagonal are average correlations between indices across all tests. Correlations were based on 423 items.




Table 6

Number of Biased Items for Race

                                     Difficulty                 Delta                 Scheuneman
                        Number  Sample  Sample    Both  Sample  Sample    Both  Sample  Sample    Both
Test                  of Items     One     Two Samples     One     Two Samples     One     Two Samples

Vocabulary                  39       3       6       0       3       5       0       1       1       0
Reading                     54       2       5       0       2       3       0       0       1       0
Spelling                    40       1       5       0       1       4       0       0       0       ?
Capitalization              30       0       0       0       0       0       0       1       2       0
Punctuation                 30       4       6       1       5       8       2       1       0       0
Language Usage              30       ?       ?       2       6       4       2       ?       ?       1
Visual Materials            46       8       7       3       5       4       1       ?       ?       0
Reference Materials         45       5       2       1       6       3       1       ?       ?       0
Math Concepts               37       2       1       0       3       1       0       1       0       0
Math Problem Solving        27       ?       ?       ?       2       3       0       1       2       0
Math Computation            45       0       0       0       0       0       0       0       4       0

Overall                    423      33      39       7      33      35       6       6      12       1
                                (7.8%)  (9.2%)  (1.6%)  (7.8%)  (8.3%)  (1.4%)  (1.4%)  (2.8%)  (0.2%)

Notes: i) Items with difficulty or delta indices above 0.75 or Scheuneman indices above the 0.05 critical level for a chi-square distribution with 4 degrees of freedom were classified as biased.

ii) Number of biased items in both samples refers to the number of items classified as biased in both sample one and sample two.

iii) Overall percentages of biased items are shown in parentheses.

iv) The agreement of classification across samples was evaluated using chi-square tests of independence with Yates' correction. The statistics were 4.70 for difficulty, 3.32 for delta, and 0.66 for Scheuneman. Only the test for the difficulty index surpassed the 0.05 critical value of the chi-square distribution with 1 degree of freedom.
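
The chi-square statistics in note iv can be approximately reproduced from the Overall row of Table 6. The sketch below is an illustration rather than the authors' program: it builds the 2 x 2 table of items flagged in sample one by items flagged in sample two and applies the usual Yates-corrected formula. The function name is hypothetical.

```python
def yates_chi_square(n_items, n_sample1, n_sample2, n_both):
    """Yates-corrected chi-square for agreement between two sets of flags."""
    a = n_both                       # flagged in both samples
    b = n_sample1 - n_both           # flagged in sample one only
    c = n_sample2 - n_both           # flagged in sample two only
    d = n_items - a - b - c          # flagged in neither sample
    numerator = n_items * (abs(a * d - b * c) - n_items / 2.0) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


CHI2_CRIT_DF1 = 3.841                # 0.05 critical value, chi-square with 1 df

difficulty = yates_chi_square(423, 33, 39, 7)
delta = yates_chi_square(423, 33, 35, 6)
print(round(difficulty, 2), difficulty > CHI2_CRIT_DF1)
print(round(delta, 2), delta > CHI2_CRIT_DF1)
```

Under these assumptions the difficulty and delta statistics come out close to the reported 4.70 and 3.32, and only the difficulty value exceeds the critical value, in line with the note.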

Table 7

Number of Biased Items for Sex

                                     Difficulty                 Delta                 Scheuneman
                        Number  Sample  Sample    Both  Sample  Sample    Both  Sample  Sample    Both
Test                  of Items     One     Two Samples     One     Two Samples     One     Two Samples

Vocabulary                  39       3       3       1       3       2       1       ?       0       0
Reading                     54       ?       ?       1       3       2       1       0       ?       0
Spelling                    40       1       2       0       0       2       0       0       0       0
Capitalization              30       0       0       0       0       0       0       0       1       0
Punctuation                 30       1       2       0       1       2       0       1       0       0
Language Usage              30       2       1       0       1       1       0       0       0       0
Visual Materials            46       2       3       1       2       3       1       0       ?       ?
Reference Materials         45       ?       ?       0       2       0       0       2       0       0
Math Concepts               37       0       ?       0       1       1       0       1       0       0
Math Problem Solving        27       1       0       0       1       0       0       0       0       0
Math Computation            45       0       0       0       0       0       0       2       ?       ?

Overall                    423      15      15       3      14      13       3       9       4       0
                                (3.5%)  (3.5%)  (0.7%)  (3.3%)  (3.1%)  (0.7%)  (2.1%)  (0.9%)  (0.0%)

Notes: i) Items with difficulty or delta indices above 0.75 or Scheuneman indices above the 0.05 critical level for a chi-square distribution with 4 degrees of freedom were classified as biased.

ii) Number of biased items in both samples refers to the number of items classified as biased in both sample one and sample two.

iii) Overall percentages of biased items are shown in parentheses.

iv) The agreement of classification across samples was evaluated using chi-square tests of independence with Yates' correction. The statistics were 7.86 for difficulty, 10.72 for delta, and 4.11 for Scheuneman. Each statistic surpassed the .05 critical value. However, for the Scheuneman statistic this occurred because less than chance agreement was observed.

Appendix

For the sake of completeness, the reliabilities of signed versions of all but the Scheuneman index were calculated. The reliabilities for signed versions of the difficulty, delta, biserial, and point biserial indices were calculated as described in the paper except that the absolute value of the difference was not taken. The signed 3-parameter index is the overall index described in Linn and Harnisch (1981).

Tables A1 and A2 present the reliabilities. Table A1 corresponds to Table 2 and Table A2 corresponds to Table 3 in the text. Although the signed indices have somewhat greater reliabilities than the unsigned indices, the reliabilities are still consistent with the conclusions stated in the text.



Table A1

Reliability of Signed Item Bias Indices for Race

                        Number                                             Point
Test                  of Items  Difficulty       Delta    Biserial    Biserial 3-Parameter(a)

Vocabulary                  39          ?         .46*        .33*        .52*       -.21
Reading                     54        .52*        .49*        .11         .23
Spelling                    40        .34*        .30         .17         .19
Capitalization              30        .28         .35*        .48*        .60*
Punctuation                 30        .72*        .70*        .23         .29
Language Usage              30        .63*        .62*        .44*        .59*        .34
Visual Materials            46        .70*        .63*        .08         .18
Reference Materials         45        .23         .15         .01           ?
Math Concepts               37        .64*        .60*        .13         .15
Math Problem Solving        27        .55*        .45*        .44*        .53*
Math Computation            45        .38         .35        -.02         .04

Median                                .55         .46         .17         .23

* p < .05

(a) Index was computed only for those tests with values in this column.

Table A2

Reliability of Signed Item Bias Indices for Sex

                        Number                                             Point
Test                  of Items  Difficulty       Delta    Biserial    Biserial 3-Parameter(a)

Vocabulary                  39        .38*        .39*          ?         .00        -.16
Reading                     54        .46*        .45*        .19         .20
Spelling                    40        .16         .09         .25         .21
Capitalization              30        .11         .07        -.05        ?.08
Punctuation                 30        .13         .07        -.17        -.16
Language Usage              30       -.04        -.03         .08         .08        -.15
Visual Materials            46        .46*        .43*        .01         .06
Reference Materials         45        .38*        .37*        .00         .15
Math Concepts               37        .38*        .36*        .22         .?8
Math Problem Solving        27        .02         .02        -.11        -.11
Math Computation            45        .38*        .34*        .21         .22

Median                                .38         .34         .02         .08

* p < .05

(a) Index was computed only for those tests with values in this column.



